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CN Abstract 
> 

O In Kuznetsov et al. [28] a new Monte Carlo simulation technique was introduced for 

a large family of Levy processes that is based on the Wiener-Hopf decomposition. We 
CN pursue this idea further by combining their technique with the recently introduced multi- 

level Monte Carlo methodology. Moreover, we provide here for the first time a theoretical 
I— I analysis of the new Monte Carlo simulation technique in |28j and of its multilevel variant 

for computing expectations of functions depending on the historical trajectory of a Levy 
process. We derive rates of convergence for both methods and show that they are uni- 
form with respect to the "jump activity" (e.g. characterised by the Blumenthal-Getoor 
Cd index). We also present a modified version of the algorithm in Kuznetsov et al. [28] which 

^ combined with the multilevel methodology obtains the optimal rate of convergence for 

general Levy processes and Lipschitz functionals. This final result is only a theoretical 
T— I one at present, since it requires independent sampling from a triple of distributions which 

is currently only possible for a limited number of processes. 

QQ Key words and phrases: Wiener-Hopf decomposition, Monte Carlo simulation, mul- 

tilevel Monte Carlo, Levy processes, barrier options. 
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• Si 1 Introduction 

X 

^ A classical problem in mathematical finance deals with the ability to compute E[F(X)], where 
X := {Xs : s G [0,t]} is a stochastic process which models the underlying risky asset and F is 
a payoff function which may depend on the historical path of X. This is the typical setting for 
pricing a variety of exotic options in finance such as look back and barrier options. Starting 
with the early work of Madan and Seneta 130], a class of processes playing the role of X that 
have found prominence in this respect is the class of Levy processes. An extensive overview 
of their position in mathematical finance can be found in the books |6l |TTl [33l |3l]. More 
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recently Levy processes have also been extensively used in modern insurance risk theory; see 
for example Asmussen and Albrecher [2] and Kliippelberg et al. • In insurance mathematics, 
it is the Levy process itself which models the surplus wealth of an insurance company until its 
ruin. There are also extensive applications of Levy processes in queuing theory, genetics and 
mathematical biology as well as in stochastic differential equations (see e.g. [8l [TU UHl 123]). 

In both financial and insurance settings, a key quantity of generic interest is the joint 
law of the current position and the running maximum of a Levy process at a fixed time, if 
not the individual marginals associated with the latter bivariate law. Consider the following 
example. If we define Xt = sup^<^ Xg then the pricing of barrier options boils down to evaluating 
expectations of the form ]E[/(x + Xt)l^^^x^^f^y] for some threshold b > 0. Indeed if /(x) = 
{K — e^)"*" and K > 0, then the latter expectation is related to the value of an "up-and-in" 
put. In credit risk one is predominantly interested in the quantity F{Xt < x) as a function 
in X and t, where P is the law of the dual process —X. Indeed it is a functional of the latter 
probabilities that gives the price of a credit default swap not to mention the recently introduced 
financial instruments known as convertible contingencies (CoCos). See for example the recent 
book of Schoutens and Cariboni [M] as well as Corcuera et al. Il2\. One is similarly interested 
in F[Xt > x) in ruin theory, since these probabilities are also equivalent to the finite-time ruin 
probabilities; cf. Asmussen and Albrecher j2j. 

A widely used approach to compute expectations of functions depending on the historical 
trajectory of a Levy process over the time horizon, say [0,t], is to approximate the path by a 
random walk with n steps, each step covering t/n units of time, and therewith to perform a 
Monte Carlo (MC) simulation. Giles [IHl [IS] introduced an adaptation of the straightforward 
MC methodology, the multilevel Monte Carlo method (MLMC), which is especially suited to 
the scenario we are interested in, but in the case that X is a pure diffusion process. Very 
recently, there has been increasing attention to the MLMC method also in the setting of Levy 
processes, see Giles and Xia [20] for the jump diffusion setting, and Dereich [H] and Dereich 
and Heidenreich [T3] for more general Levy processes. Generally speaking all these methods 
share a common approach, which consists in constructing an embedded sequence of grids that 
are made up of a mixture of deterministic and random points. The random points in these 
grids deal with the large jumps of the Levy process and the deterministic points deal with the 
"small movements", that is to say, the diffusive part and/or the small jumps. 

In this paper we consider an alternative method based entirely on a random grid. In 
particular we shall introduce an adaptation of the MLMC method based on a very recently 
introduced technique for performing MC simulations that appeals to the so-called Wiener- Hopf 
factorisation for one-dimensional Levy processes. This last technique is called the Wiener- Hopf 
Monte Carlo (WHMC) simulation method and it was introduced in Kuznetsov et al. fIE\ . 
We will denote the coupling of these two techniques the multilevel Wiener-Hopf Monte Carlo 
method (MLWH). Our analysis will focus on the particular setting that X is a one-dimensional 
Levy process and F is a Lipschitz function which depends on the value of X and its past 
supremum at a fixed time t > 0. 

The main result in this paper shows that the root mean square error in the MLWH method 
converges in general with order 0{iy~^) for processes of unbounded variation and (9(z/^3) for 
processes of bounded variation. Unlike Dereich [H], Dereich and Heidenreich [15], and Giles and 
Xia [20] , our method is robust in the sense that the convergence rate does not exhibit dependence 
on the nature of the jump structure (in particular small jumps) of the underlying Levy process. 
Moreover, the implementation of the algorithm is extremely straightforward, requiring only to 
be able to independently sample from two distributions. This robustness comes at the price of 
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our algorithm not having optimal convergence rates, in particular when the underlying Levy 
process has paths of bounded variation. For Levy processes of unbounded variation we are able 
to derive regimes where our methodology performs better than the algorithms in Dereich |14j . 
Dereich and Heidenreich [15] and Jacod et al. [22] and, as far as we know, better than any other 
existing algorithm. We will also show the existence of another multi-level algorithm, which is 
a modification of the original algorithm in Kuznetsov et al. [28] , that achieves a convergence 
order 0{iy~^ \og^ u) independently of the jump structure. As shown in Creutzig et al. |13j . 
up to the logarithmic term this is in fact optimal for general Levy processes. Unfortunately 
however, this algorithm requires sampling from a triplet of dependent distributions for which 
there are currently very few known examples of Levy processes where this is possible. We shall 
elaborate further on this point later in this paper. 

As alluded to above, we shall benchmark our MLWH results against those of Dereich and 
Heidenreich [15] and Dereich [Tl]. Note that, even though these two papers consider the more 
general setting of expectations of path functionals of solutions to SDEs driven by general 
Levy processes, the scenario we consider here is permitted within their framework and we are 
currently working on extending our methodology to the more general setting. 

The paper is organised as follows. In the next section we will review the general setting for 
the Wiener-Hopf factorisation of Levy processes and describe the WHMC method introduced 
in Kuznetsov et al. [28]. Thereafter, in Section |3| we describe how an adaptation of the 
WHMC method can be used in the context of multilevel simulation, that is to say, we formally 
introduce the MLWH method. Section |4] is devoted to the numerical analysis of the one level 
(WHMC) and the multilevel (MLWH) method, in particular, giving mathematical justification 
to our claims concerning the performance of the MLWH method above. Moreover, we discuss 
how the MLWH method performs against other approaches that have been mentioned. Section 
[5] will introduce a theoretical variation of the original MLWH, which we can show achieves 
optimal convergence rates but which, unfortunately, cannot yet be implemented in practice until 
more substantial examples of the Wiener-Hopf factorisation are discovered. Finally, Section |6] 
provides some implementation details and numerical computations for a representative example 
which support the previous theoretical derivations and confirm the feasibility of the approach. 



2 Preliminaries 

2.1 Levy processes and the Wiener-Hopf factorisation 

We begin by briefly reviewing the definition of a one-dimensional Levy process and its associated 
Wiener-Hopf factorisation. For a more in-depth account we refer the reader to the monographs 
of Bertoin [3], Kyprianou [22] or Sato Note that the Wiener-Hopf factorisation only exists 
for one- dimensional Levy processes. 

Recall that a one- dimensional Levy process with law P, henceforth denoted hj X := {Xt : 
t > 0}, is a stochastic process issued from the origin which enjoys the properties of having 
stationary and independent increments with paths that are almost surely right-continuous with 
left limits. It is a well understood fact that, as a consequence of this definition, the law of 
every Levy process is characterised through a triplet (a,cr, H), where a G M, a > and H is a 
measure concentrated on M\{0} such that J^{1 A a;^)H(dx) < 00. More precisely, for all t > 0, 

E[e^^^*] =e-**(^), for all 9 eR, 
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where 

^(9) = ia9 + \a^e'^ + / (1 - e'^^' + i^xl(|^|<i))n(dx) 

is the so-called characteristic exponent of the process. 

A property that is common to all Levy processes is the so-called Wiener-Hopf factorisation. 
Suppose that for any g > 0, e(g) is an exponentially distributed random variable with mean 
q^^ that is independent of X. Recall that Xt = sup^<^Xs and let 2Lt '■= infs<iXs. The Wiener- 
Hopf factorisation states that the random variables Xe{q) and Xe{q) — Xe(q) are independent. 
Thanks to the so-called principle of duality, that is to say the equality in law of the pair 
— Xt : < s < t} and {—Xs : < s < t}, it follows that Xe{q) — -^e(g) is equal in 
distribution to —X^i^^y This leads to the following factorisation of characteristic functions 

E(ei^^e(,)) = E(e'^^-(«)) X E(e^^^-(«)), for all ^ G M, 
known as the Wiener-Hopf factorisation. Equivalently, 

^e(q) = Sq + Ig, (1) 

where Sq and Iq are independent and equal in distribution to Xe(g) and 2Le{q): respectively. 

Here we use the notation = to mean equality in distribution. 
We will work under the following assumptions: 

(Al) a;2H(dx) < oo, 

(A2) the payoff function F : M x ]R+ — )• M is a Lipschitz function with Lipschitz constant 
assumed to be 1 for simplicity. 

(A3) the computational time to sample from 2Le{q) and Xe{g) is independent of the value of q. 

The first assumption asks for much less than what is commonly accepted in the financial 
literature, where typically exponential moments of the truncated Levy measure are required. 
Note that this condition ensures that for each t > 0, Xt has a finite second (and hence also a 
first) moment. 

The second and third assumptions are equivalent to the corresponding conditions imposed 
in Dereich and Heidenreich [15] and Dereich [H]. A justification for the third assumption in the 
case that X belongs to the so-called /3-class of Levy processes is provided in Section |6| As we 
shall discuss in more detail below, the /3-class is a large family of Levy processes which is both 
widely suitable for use in financial modelling, as well as for implementation of our algorithm. 

2.2 The Wiener-Hopf Monte Carlo method 

Due to the independent increments of Levy processes, the most common approaches to the 
Monte Carlo simulation of expectations involving the joint law of {Xt,Xt) work with random 
walk approximations to the Levy process. This requires one either to be able to simulate the 
increments of the Levy process exactly for a fixed time step or to be able to suitably approximate 
the Levy process, typically by a jump diffusion process. 

Recently introduced by Kuznetsov et al. |2H] , the so-called Wiener-Hopf Monte Carlo method 
is related to the simulation of increments. However, it does not work with a fixed deterministic 
grid, but instead requires the underlying grid to be random with independent and exponentially 
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distributed spacings (i.e. the arrival times of a compound Poisson process). This allows us to 
simulate the paths of a very large class of Levy processes and thus to sample from the law of 
{Xt,Xt) where r is a random time whose distribution can be concentrated arbitrarily close 
around t depending on a parameter chosen in the algorithm that controls the resolution of the 
random grid and thus the amount of work. 

To describe the WHMC method in more detail, let us suppose that ei(l),e2(l), ■ ■ ■ are a 
sequence of i.i.d. exponentially distributed random variables with unit mean. The basis of the 
WHMC algorithm is the following simple observation, which follows directly from the Strong 
Law of Large Numbers. For all t > 0, 

t 

^ -ej(l) t as n t oo (2) 

i=l ^ 

almost surely. Note that the random variable on the left hand side of ^ can also be written 
as the sum of n independent random variables with an exponential distribution having mean 
t/n and therefore is equal in law to a Gamma random variable with parameters n and n/t, 
henceforth written as g{n,n/t). For sufficiently large n, Kuznetsov et al. [28] argue that a 
suitable approximation to P(Xt G dx, Xt G dy) is P(Xg(„ „/t) G da;, X^(^n,n/t) £ dy). Indeed, it 
is a triviality to note that, thanks to ([2]) and the independence of g{n,n/t) from X, the pair 
(Xg(„ Xg(„^„/()) converges almost surely to {Xt,Xt) as n ^ oo. We will compute rates of 
convergence in Section |4j 

The following theorem is straightforward to prove using ([T]) together with the stationary 
and independent increments of the underlying Levy process. 

Theorem 2.1 (Kuznetsov et al. [281 Thm. 1]) Let {S^^^ ■ J > 1} and {P^^ : J > 1} be 
i.i.d. sequences of random variables with common distribution equal to that of Xe(n/t) and 
K.e{n/t)> respectively. Then, for all n eN, 

{Xg(n,n/t),Xg(^n,n/t)) = {V {u, U / 1) , J {fl, Tl/t)) , 

where, for any G N, and setting V{0, n/t) := we define 

V{k, n/t) = V{k - 1, n/t) + (^^/, + 4^/,) , (3) 

k 

J{k,n/t) = \J [V{j~l,n/t) + SQ. (4) 
j=i 

It is clear from earlier remarks on the convergence of (-^g(n,n/i); -^g(n,n/t)) that the pair 



(y{n,n/t), J{n,n/t)) converges in distribution to (Xt,Xt). Theorem 2.1 suggests that as soon 
as we are able to simulate i.i.d. copies of the distributions of Sn/t and In/t, then by the 
simple functional transformations given in (§ and g, we may produce an exact draw from 
the distribution of {Xg(^n,n/t), ^ g{n,n/t)) ■ Moreover, for a suitably nice function F, using stan- 
dard Monte Carlo methods based on the Strong Law of Large Numbers, one may estimate 

'^{F{^s{n,n/t), X g(^ri,n/t))) by 

M 



i=l 

where F"'*^*) is the i-th sample of 



:= F{V{n,n/t),J{n,n/t)) . 
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Indeed, we have liniMtoo -^mc^ ~ ^ {F'{Xg(^n,n/t),^g(n,n/t))) almost surely, which in turn con- 
verges to E{F{Xt, Xt)) as n t oo. 

As alluded to above, the WHMC method is numerically feasible only if samples from the 
distributions of Sn/t and In/t are available. Until recently, this would have proved to be a 
significant stumbling block on account of there being few examples for which the aforesaid 
distributions are known in explicit form. However, developments in Wiener-Hopf theory for 
Levy processes in the last couple of years (see for example Kuznetsov [26] or Kuznetsov et al. 
[23 [2H]) have provided a rich enough variety of examples for which the necessary distributional 
sampling can be performed. This family of processes are named meromorphic Levy processes in 
Kuznetsov et al. [271 i2H] • One large subfamily of such processes is the /5-class of Levy processes, 
which also conveniently offers all the desirable properties of better known Levy processes that 
are used in mathematical finance, such as CGMY processes, VG processes or Meixner processes; 
see for example the discussions in Ferreiro-Castilla and Schoutens [ITj and Schoutens and van 
Damme |35j . 



2.3 The WHMC method in context 

Let us now spend a little time contextualising the WHMC method for simulating the path of 
a Levy process against related possible alternative schemes, pointing out in particular where 
we should expect to see improved efficiencies in working with the Wiener-Hopf factorisation. 
A general point of reference for further reading is the book of Cont and Tankov [TT] . 

Generally speaking, there are three alternative approaches to simulate the path of a Levy 
process. The first one relies on the ability to simulate the increments of the Levy process exactly 
for a fixed (deterministic) time step and therefore construct a random walk as discussed at the 



beginning of Section 2.2 This method requires knowledge of the distribution of the Levy process 
at a fixed time, either through an exact analytical formula, or via numerical inversion of the 
characteristic function. There are also general results on the simulation of infinite divisible 
distributions, for which the distribution of a Levy process at a fixed time is an example, see 
e.g., Bondesson [5]. As mention earlier, approximating a Levy process by an embedded random 
walk may introduce significant errors on path functionals of X such as the running maximum 
which is of prime interest in applications we have in mind (cf. Broadie and Glasserman JTj). 

The second approach is to use a time-dependent infinite series expansion to approximate 
the value of the Levy process at each fixed time. A general result in this direction is found in 
Rosihski [31] for the case that an explicit expression for the Levy measure is known in closed 
form. The aforesaid series representation converges uniformly and almost surely in any compact 
set of time. This lends itself better to sampling path functionals of the process than, perhaps 
the random walk approximation but it might make the numerical analysis difficult. 

Finally, the third and most common approach is to approximate X via a jump-diffusion 
process; that is to say a Levy process which can be written as a linear Brownian motion plus 
an independent compound Poisson process. This is done by truncating the Levy measure, 
removing all small jumps below a certain threshold in magnitude and compensating for their 
removal by making an appropriate adjustment to the linear and/or Gaussian component. The 
truncation of small jumps ensures that the remaining jumps conform to a compound Poisson 
structure and hence one is left with simulating the path of a linear Brownian motion, interlaced 
with jumps distributed according to the normalised truncated Levy measure arriving at an 
appropriate Poissonian rate. The method of truncating small jumps, originally due to Asmussen 
and Rosihski [3], is an obvious approach (cf. Cont and Tankov PJ4 Sect. 6.3] and the references 
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therein) and it is the approach taken by Dereich and Heidenreich [15] in their design of multilevel 
methods for Levy-driven stochastic differential equations. There it is observed that when the 
jump component of X is of finite variation, one may reasonably replace the small jumps by 
a linear trend. Further, in the case of an unbounded variation jump component it is seen 
that a more appropriate approximation is to replace the small jumps by a Gaussian process. 
This truncation method is also the approach used in Dereich |T3] . Whilst |15l [H] successfully 
demonstrate the convergence of this truncation method, there are limitations; see, for example, 
the discussion in Asmussen and Rosihski P]. 

The main differences between these three methods and the WHMC simulation scheme are 
three-fold. Firstly, whilst the the WHMC scheme is restricted to Levy processes that allow 
sampling from the distributions of the variables X^^q) and ^e(g)) is otherwise indifferent 
to the jump structure of the underlying Levy process. Indeed, the approximation of a Levy 
process by a compound Poisson process, which is the most popular approach, relies on the 
Levy-Ito decomposition. The latter decomposes the process with respect to the endogenous 
jump structure. In contrast, the Wiener-Hopf factorisation concerns the decomposition of the 
path of a Levy process according to the distribution of its maximum. Secondly, the WHMC 
method relies entirely on a randomised time grid. Note for comparison that the approach in 
[T^ |T3] requires a deterministic grid interlaced by a random grid capturing the large jumps. 
Thirdly, the WHMC works on the principle of sampling the Levy process over the "wrong" 
time horizon (but in a way which can be made arbitrarily close to the desired time horizon) 
purely in order to be able to sample from the exact maximum. Note that a similar idea of 
randomising the time horizon in a computation involving the expectation of a functional of the 
path of a linear Brownian motion that appears in the context of pricing American options is 
described in Carr [9\. We shall also see in the forthcoming numerical analysis that this idea 
of sampling over the "wrong" time horizon turns out to make the numerical analysis of the 
WHMC method, as well as its multilevel version, relatively straightforward. Indeed the rate 
of convergence of the WHMC and of the MLWH method can be expressed directly in terms of 
the rate of convergence of the randomised time horizon to the fixed time t. 

Since the Wiener-Hopf technique naturally leads to a random walk with exponential time- 
spacings that will on average shrink as n oo, the WHMC and the MLWH methods are in 
principle extendable to more general problems than the one considered here, for instance to 
approximate solutions of Levy driven SDEs (as done in [121 [Ti]). 

3 Multilevel Wiener-Hopf Monte Carlo simulation 

As noted already in the context of pure diffusion processes by Giles [IHl HH] and others, optimal 
convergence rates in simulating from random processes are only achievable by using multiple 
levels of time grids. This has its roots in multigrid techniques for deterministic differential 
equations. It relies on the fact that large features or longtime trends of a process can be 
approximated on a coarse time grid to a sufficient accuracy that can be related clearly to the 
time step size. Smaller features are then added as corrections to this longtime trend on a finer 
time grid. This can be done without introducing additional bias, and the benefit is that the 
variance of this correction on the finer grid is substantially smaller than the variance of the 
original process on the same grid, leading to a variance reduction and thus a smaller number 
of Monte Carlo samples, necessary to achieve a prescribed absolute tolerance. This variance 
reduction stems from the fact that our time discretisation scheme is a convergent process and, 
provided the convergence rates are known, the variance reduction, and thus the cost of the 
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method, can be rigorously quantified. Applying this idea recursively on a sequence of ever finer 
grids, allows the simulation of the entire process with all small features in optimal computational 
complexity for any prescribed tolerance of the bias and the standard deviation. In more general 
terms, we are simulating from a random process by combining large numbers of very coarse and 
cheaply available paths with small numbers of fine paths that are expensive to compute. We 
refer to Giles fl8\ [T9] and Cliffe et al. |T0] and the references therein for a detailed overview of 
the method, as well as to Giles and Xia pO], Dereich and Heidenreich [15] and Dereich |14j for 
applications to simulating jump diffusions and Levy processes. Our interest here is to construct 
a multilevel version of the WHMC method. 

3.1 Definition of the multilevel method 

Since the multilevel method relies on variance reduction, it is worth recalling the formula for 
the mean square error of the WHMC method for subsequent reference. The mean square error 
between the estimator and K[F{Xt,Xt)] is 

erec"")' := E[{F;:i^-E[F{X,,X,)]r]- 

Since E[F;^c] = E[F"] and Y[F;^c] = M-iV[F"], where ¥[■] denotes variance, it can be 
decomposed as follows: 

= M-iV[F"] + {E[F^ - F{Xt,X,)])\ (6) 

The first term in the above decomposition is the variance of the WHMC simulation and the 
second one is the bias of the approximation induced by the randomised time horizon. Recall 
that converges in distribution to F{Xt, Xt) as n f oo. We shall say that F"" becomes a finer 
approximation of F{Xt, Xt) the larger n becomes. Conversely, the approximation F" is said to 
become coarser the smaller the value of n. 

The MLWH method starts from the simple observation that we can write the expectation 
of the finest approximation F"^ as a telescopic sum starting from a coarser approximation F"°, 
as well as intermediate ones: 

L 

E[F"^] = E[F"«] + ^ E[F"^ - , 1 < no < Ui < . . . < ul ■ (7) 

e=i 

A typical choice for ne/no, i = 0, . . . ,L, are powers of 2, or more generally of some integer 
s G N, i.e. n£ := nos^, for some no G N. For the remainder of this paper, it suffices to simply 
think of n£ := no2^, for i = 0,...,L. The MLWH method now consists in independently 
computing each of the expectations of the telescopic sum by a standard Monte-Carlo method, 
i.e. 

Mo L Me 

^ i=l i=l ^ i=l 

where Ai{nQ, L) := {Mi}^^^ . Analogously to ([6|, we can again expand the mean square error 
for the multilevel estimator to obtain 
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Note that the bias term remains the same, i.e. we have not introduced any additional bias. 
However, by a judicious choice of Ai{nQ, L) it is possible to reduce the variance of the estimator 
(i.e. the first two terms) for any pre-chosen computational effort, or conversely reduce the 
computational cost for any pre-selected tolerance of the variance of the estimator. 

The variance reduction stems from the fact that converges to F{Xt, Xt) in mean square. 
This is equivalent to the mean square convergence of the Cauchy sequence F^' — F^^-^, and 
thus VfF"* — — )■ 0. This implies that the contribution to the total variance from the finer 
levels (where the cost per sample is high) is decreasing with — )■ oo, allowing us to use only 
very few of these expensive samples, while on the coarser levels the cost of samples is smaller 
anyway. At this point in time, we have not yet established rigorously that converges to 
F{Xt, Xt), not to mention at what rate. This will be dealt with in Section [4j However, let us 
assume mean square convergence for now. 

3.2 Thinning a Poisson random grid 

Before continuing, it is first necessary to elaborate on another important point. One of the 
defining features of the multilevel method is that both approximations of the payoff function 
in simulating from the random variable (F"^ — F"^^-^) should come from the same draw. This 
means that once the increments of the Levy process have been sampled to obtain a draw for F"*^, 
a deterministic transformation of the sample points and of the increments should take place to 
obtain a sample for F^^^-^ (or vice versa). While this procedure is clear in the original setting 
of Giles [in], where a diffusion processes is approximated by a random walk on a deterministic 
grid, it is not entirely trivial how to proceed in the case of a general Levy process and with 
respect to random grids. 

Our approach to construct a sample for a coarser approximation from the draw of a finer 
approximation relies on the technique of Poisson thinning. A similar approach was considered 
in Giles and Xia [20] in the case of jump diffusion processes and can be traced back further 
(also for general Levy processes) to, e.g., Glasserman and Merener |21j . 

The first step to sample from (F"^ — F'^'^-i) on level i is to construct a Poisson random 
grid with rate ni/t, i.e. the inter-arrival times are independently distributed as an exponential 
random variable with mean t/n£. Fix £ G N and t > and let A^^ := {N^ '■ s > 0} denote a 
Poisson process with arrival rate rii/t; the arrival times in are denoted by {T^'^ : ^ > 0}, 
with Tq'^ = 0. Recall that, without loss of generality, we restricted to the case where n^/n^-i = 
2 and notice that under this notation we have g{n£,n£/t) = T^f- It is a well established fact 
(cf. Kingman [21]) that if we censor arrivals in the process A^^ by tossing independent fair 
coins at each arrival and ignoring the arrival if the coin lands, say, on heads, then the resulting 
point process in time has the same law as N^^'^. In particular the inter-arrival times in this 
coarser grid are independent and exponentially distributed with mean t/n£_^i. The general case 
n£/ni_i = s, for some s > 1, could clearly be treated in an analogous manner by tossing a 
biased coin with probability of acceptance 1/s. 

To make the construction more precise, let us return to the case s = 2 and consider the 
sequences {S^^j^ : J > 1} and {1^^/^ : J > 1} of i.i.d. random variables with common distri- 
butions Xe(nt,/t) and X_f,(^ne/t) respectively, where the exponential periods are taken from the 
Poisson process A^^. Suppose we set kq = 0, and for i > 1, we write for the index of the i-th 
accepted arrival in the process A^^ after censoring. Note that for i > 1, Hi — Ki_i are a sequence 
of i.i.d. geometrically distributed random variables with parameter one half. Note also that, 
for i > 1, the time that elapses in the process A^^ between the {i — l)-th (after censoring) arrival 
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and the i-th (after censoring) arrival is equal in distribution to precisely 

j=Ki-l + l 

It is a straightforward computation to show that the above random variable is exponentially 
distributed with mean t/rii^i, e.g. by computing its moment generating function. 

Thanks to the Poisson thinning, we can now construct the independent sequences {5*^^ : 
z > 1} and {/^^ ^/t '■ i ^ 1} of random variables, which are needed for sampling from via 
a deterministic transformation of the sequences {S^^^^^ : J > 1} and {Ing/t • i — 1} follows: 

si../t = " V {sz}^' + Ca^O + ^-A^'l (11) 
k=i I j=i ) 

j = Ki_l + l 

Although it is now clear how to construct the infinite sequence of pairs -^n^.i/i) ■ ^ — 1} 

from {{Sl^^n^ In^jt) : J > 1}, some explanation is still needed when dealing with finite sequences, 
such as those required by the MLWH method. When the Poisson random grid is stopped at the 
time horizon g(n£, n^/t), it will in general be necessary to first extend the number of exponential 
periods from the process A^^ beyond g{n£,ne/t) before thinning to get the corresponding i — 1 
level grid. This is because we cannot ensure that after thinning, the total non-censored points 
in A^^ up to g{ni,n£/t) sum up to n^_i. 

Let us point out here another issue directly related to the fact that we are constructing our 
algorithm based on a completely random grid. Denote by F"^-! a sample produced from F^^ 
by the thinning methodology described above. Since the telescopic sum in ([T]) has to cancel 
out, i.e. we do not want to introduce extra bias, it is important that the expectation of F^^-'^ is 
the same as the expectation of This is called the consistency of the multilevel algorithm. 

Thanks to the thinning theorem this is straight forward to check in the above construction, 
however it will play a role in Section |5| Indeed, the above description asserts that thinning a 
Poisson process A^^ will lead in distribution to a Poisson process of rate n£_i/t and is thus equal 
in distribution to N^~'^. Therefore the random time g(n^_i, n£_i/t) has the same distribution 
as the resulting thinned version from g{n£,ne/t), say g{ng^i,ne-i/t). Finally, due to the fact 
that g(ra^_i, n^-i/t) and g(n£_i, n£_i/t) are independent of the underlying Levy process the 
consistency follows as and have the same law. 

4 Analysis of Wiener-Hopf Monte Carlo simulation 

Henceforth we shall use the following notation. We will write a < 6 for two positive quantities 
a and b, if a/b is uniformly bounded independent of any parameters, such as t, L, {ne}f^Q, or 
{M^I^^Q, as well as of any parameters describing the underlying Levy process. We will write 
a ~ 6, if a < 6 and b < a. 

4.1 Abstract convergence theorems 

There are two ways to quantify and compare the complexity of algorithms. We can either specify 
a target accuracy e and then estimate the cost to achieve an error below this target accuracy 
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with a particular method, or conversely, we can analyse the convergence of an algorithm as 
a function of the cost. The results are easily interchangeable. The first approach is the one 
chosen in Giles [12] and Cliffe et al. [TU]. However, Dereich and Heidenreich [TJ] and Dereich 
[Ti] chose the second approach and since we will mainly compare our results with theirs, we 
will also choose this approach here. To fix notation, computational cost is measured in floating 
point operations here (but it could equally be CPU-time). Thus, for the following let us assume 
that the expected value of the computational cost C{F) to compute a particular estimator F 
of K[F{Xt,Xt)] is bounded by u, i.e. 



E[C{F)] 



< 



Then we want to bound the root mean square error e{F) in terms of u. 

Let us start by analysing the single-level WHMC method. The following result can be easily 
deduced from ([6]). Since the cost per sample in the WHMC method is itself a random variable, 
we include a short proof. 

Theorem 4.1 Let t > and let us assume that converges in mean square to F{Xt,Xt). 
Suppose further that there exist positive constants a, 7 > such that 

(t) |E[F"-F(Xt,X,)]| <n-", 

(11) E[C„] < n\ 

where Cn represents the cost of computing a single sample of F"' on a Poisson grid with rate n/t. 
Then, for every i/ G N, there exist n, M G N such that 



E 



r I rpri,M 



< u and e (Fm'^I < u-^+^ 



1 



Proof. Assuming for the moment that V[F"] is bounded independently of n, then balancing 
the two terms on the right hand side of (|6]) and using assumption (i), we see that we should 
choose M ~ n^". Now, since 

we can deduce from assumption (ii) that the expected value of the total cost of the estimator 

1 2a 

will be bounded by z/, if we choose n ~ z/2a+7 and M ~ z/2"+7, which leads to the required 
bound for ei^F^^^. 

It remains to bound V[F"]. First note that trivially V[F"] < E[(F")2] and so 

^ V[F"] < E[(F" - F{Xt,Xt)f] + m[{F{XuXtf] (13) 

The first term can be bounded independently of n using the Lipschitz continuity of F in 
Assumption (A2) and our assumption that converges in mean square to F[Xt^Xt). The 
second term is bounded due to the Lipschitz continuity of F and Assumption (Al). □ 

Thanks to the general considerations in Giles |19] and Cliffe et al. [TOl Theorem 1], the 
analysis of the multilevel version follows in the same way. The proof of the following result is 
similar to that in piP, Appendix A] , with the same modifications as above to deal with random 
computational cost. 
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Theorem 4.2 Let t > and ng = hqs^, for some i,nQ and s > 1, and suppose that there 
are positive constants a, /3, 7 > with a > A 7) such that 

(t) \K[F^^-F{Xt,Xt)]\<n-^ 

(ii) V(F"* - < 

(iii) E[C„J < n], 

where Cn represents the cost of computing a single sample of F"^ on a Poisson grid with rate n/t. 
Then, for every z/ G N, there exists a value L and a sequence Ai{no,L) = {M^j^^Q such that 



E 



C \ F' 



ML 



< 



and 



e F 



;M{no,L) 



ML 



< 



1 

u 2 , 

1 



if /3 > 7 , 

u 2 log^ z/ , if j3 = -f, 

1 

^ 2+(7-/3)/(. ^ if f3 < ■J . 



In order to apply Theorems and 4^ now, we need to find suitable a, (3 and 7 such 
that the necessary assumptions are satisfied. Let us start by noting that Assumption (A3) in 
Section [2] implies that for both the single and the multilevel algorithm we always have 7 = 1. As 
we will verify in the next subsection, the values of a and /3 in the other hypothesis of Theorems 



4.1 and |4.2 are directly related to the convergence of the time horizon g{n,n/t) to t. 



4.2 Explicit convergence rates 

Let us write for convenience = {Xt,Xt), for t > 0, and let r and r' be any non-negative 
random variables that are independent of X (but potentially correlated). Then, thanks to the 
Lipschitz assumption (A2) on F, it is straightforward to deduce that 



and 



|E[F(X,)-F(XO]| < E[\X^-Xt\] < E[\X^-Xt\] + E[\X^-Xt\] (14) 



E[{X^-X^,f]).{15) 



V(F(X,)-F(X,0) < E[(X,-X,0 ] < 2iE[{X^-X^ 



To estimate the quantities on the right hand sides of (14) and (15) we will make use of the 
following lemma. 

Lemma 4.3 Suppose that X is a Levy process which satisfies Assumption (Al). Let us denote 
by T any non-negative random variable, independent of X . Then, for any fixed t > 0, 



(t) E[{X^ - Xt 



N[X,]E[\T-t\] + {E[X,]fE[{T-tf], 



(11) E[{X^-Xtf] < 16V[Xi]E[|r-t|] + 2(E[Xi] V 0)^) E[(t - t)^]. 
Proof, (i) Due to the fact that increments of a Levy process are i.i.d. we can write 



Xt — Xt 



X^.t , if r > t, 
-Xt-r , if r < t. 



Since X and r are assumed to be independent and since E[X^] = V[Xi] s + E[Xi] s^, for all 
s > 0, this implies 



E[(X, - Xt) 



V[Xi]E[|t - t\] + (E[Xi])^E[(r - tf] . 
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(ii) To obtain the result for the mean square error of the supremum process we first note the 
following equality in distribution 



where s < t and X^_^ is independent of {Xg' s' < s} and identically distributed to Xts- 
Taking account of the duality property for Levy processes, which tells us that is equal in 
distribution to X, — Xc, we have that 



d 



X, - X, ^ V (X'i + J < x,_,, 
where X" is independent of X^_g and equal in distribution to X^. It is now easy to check that 

E[(X.-X,f]<E[X;,_,|]. 

Let us decompose X into its martingale part X* := {X^* '■ t > 0} and a drift, i.e. Xt = 
X^ + tE[Xi], for t > 0. Due to Assumption (Al) this is always possible. Then we have 

E[X'] = E[sup(X; +E[Xi]s)2] 

s<t 

< E[(X; + t(E[Xi] VO))'] 

< 2(^E[(X;)2] + (E[Xi] V0)2t2^, (16) 

where X^ = supg<jXj'. Appealing to Doob's martingale inequality (see e.g. Sato [32, p. 167]) 
we end up with the inequahty 



E[X,] < 2 (^8E[(X;)'] + (E[Xi] V 0)' r J , (17) 

from which we may now write (again remembering that r is independent of X) 

E[X;,_,|] < 16E[(X|;_,|)2] + 2(E[Xi] V 0)2E[(r - tf]. 
Since E[[X*^_^Y] = E[(X* — Xj*)^], we may use the conclusion of part (i) and finally write 

E[X|^_,|] < 16E[(X; - X;)2] + 2(E[Xi] V 0)2 E[(r - t)^] 

= 16V[X*] E[|t - t\] + 2(E[Xi] V 0)2) E[(r - t)^], 

which concludes the proof since V[Xi'] = V[Xi]. □ 



Using this lemma we can now verify the remaining hypotheses in Theorems 4.1 and 4.2 and 
in particular establish the values for the parameters a and /3 in the convergence bounds for the 
WHMC and MLWH methods. 

Proposition 4.4 Let t > and X satisfies Assumption (Al). Then, for any n ^N, we have 
(i) E[(Xg(„,„/,) - Xtf] ^ and E[|Xg(„,„/i) - X^l] < n-^l\ 

(%t) E[(Xg(„,„/t) - Xtf] < and E[|Xg(„,„/i) - X^l] < n-^'\ 
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//, in addition, X has paths of bounded variation then we have the sharper bound 
E[|Xg(„,„/i)-Xi|]<n-i/2 E[|Xg(„,„/i)-Xi|]<n-i/2_ 

Proof. It suffices to prove tlie results for the second moments in (i) and (ii). The results for 



the first moments then follow from Jensen's inequality. With a view to applying Lemma |4.3 
we will show that for all n G N 



EMn,n/t)-tY] 
E[|g(n,n/t)-t|] 



t' 



n 



n 



2te~ "— ~ n 2 
n! 



(18) 
(19) 



the gamma distribution g{n,n/t). For equation (19), it is clear that 



where the final equivalence in (19) follows from Stirling's formula. 

First note that, since E[g(n, ra/t)] = t, equation (18) is just the formula for the variance of 
distribution g{n, 

E[\g{n,n/t)-t\] 

Jo Jt 

= 2 / (t- s)^(n,n/t,s)ds, (20) 
Jo 

where g{k,6,s) = r{k)~^6^s'''^e~^^ . Recall the definition of the incomplete gamma function, 
7(fc,-u) := x^~^e~^dx, so that 







',(*.o..)d. = i||) 



and 



We can now rewrite the expression (20) as 
ft 



(t — s)g{n, n/t, s)ds 







sg{k,e,s)ds = ^-^'^±^ 



g{n,n/t,s)ds — I sg{n,n/t, s)ds 



'-yin, n) 7(n + 1, n) 
Tin) Vin + l) 



(21) 



From the representation of 7(A;, u) in, for example, Abramowitz and Stegun [1, Section 6.5.13]), 
we can write for = 1, 2, ■ ■ ■ , 



V{k) T{k) 7o 



X 



fc— 1„— X 



U 



k-l 



which, when substituted into (21), together with (20) gives the equality (19). 



If X has paths of bounded variation then we can return to the proof of Lemma 4.3 



m 



particular the estimates in (16) and (17), and use similar computations together with Doob's 
martingale inequality for absolute moments (cf. Sato [32l p. 167]) to deduce that 

E[|Xg(„,„/i)-Xt|] < E[X;(„,„/,)_,|] + (E[Xi]VO)E[|g(n,n/t)-t|] 



< 8Ef|X, 



|g(n,n/t)-t|l 



+ (E[Xi] VO)E[|g(n,n/t)-t|]. 



(22) 



Note also that E[|Xg(„„/t) — Xt\] = E[\X\g(^n,n/t)-t\\] is essentially bounded by a linear combina- 
tion of I] and E[|g(n, n/t) — t\]. Taking account of (19), we therefore see that the 

proof is complete as soon as we show that lE[|X|g(.^ ^^^-)_^| |] < n'^i^ 
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To this end, note that every Levy process of bounded variation can be written as the 
difference of two independent subordinators. Accordingly we shall write, for any t > 0, = 
X[ — X'l . Assumption (Al) ensures that both X' and X" have finite first and second moments 



at all fixed times. Therefore, using a similar derivation as above, as well as (19), we have 



^[l^|*g(n,n/t)-t|l 



< E[-^|g(n,nA)-t|] 

= E[|g(n,n/t)- 



t|]E[|x;i + |xri] 



n 



-1/2 



□ 



We are now ready to state the main result of the paper which gives the convergence rates 
of the WHMC and the MLWH method. 

Theorem 4.5 Suppose that assumptions (Al)-(A3) are satisfied. Then, the hypotheses in 
Theorems 4-i and 4-S hold with a = 1/4, /3 = 1/2 and 7 = 1. In the case that X has paths 



of bounded variation we have a = 1/2. Thus, for any G N and under the constraint that the 
expected value of the total computational cost is 0{v) operations, the root mean square errors 
for the single and multilevel methods are 



n,M\ 
MC J 



< 



V 



V 



and 



,pM{no,L). < 



if X is of unbounded variation, 
if X is of bounded variation. 

if X is of unbounded variation, 
if X is of bounded variation. 



(23) 



(24) 



Proof. This follows immediately from Theorems 4.1 and 4.2 Proposition 4.4 equations (14) 



and (15) and the fact that 7 = 1 (cf. Assumption (A3)). 



□ 



Remark 4.6 Note that Assumption (A2) implicitly transforms the weak error estimate in 



Theorems 4.1 and 4.2 



into a strong error estimate through inequality (14). This means that 



by using Proposition AA_ we only get a lower bound estimate of the asymptotic parameter a. 
By noting that the time horizon g{n,n/t) is an unbiased estimator of t one easily sees that 
|E[Xg(„„/f) — Xt]\ = 0. Although this does not hold for the supremum process in general, 
for particular cases and under regularity constrains on the function F, estimates for the weak 
convergenc^ might be computable, see for instance the approach taken in Jacod et al. [22] in 
the case of functions depending on the terminal value of the solution of a SDE. We have chosen 
to follow the approach taken in Dereich [T3] and Dereich and Heidenreich [15] which allows for 
more general functions F with the penalization that we will have only strong estimates. We 
will compute numerically the weak error in the numerical implementation of the algorithm in 



Section 6.1 and show that there is a substantial improvement of the estimates of parameter a. 



We finish this section by comparing the convergence results of Theorem 4.5 with those of 
Dereich and Heidenreich [15J, Dereich [13] and Jacod et al. [22] • Since all these alternative 
methods show a significant dependence on the jump structure of the underlying Levy process, 
let us first recall the definition of the Blumenthal-Getoor index. 



P 



:=inf|/3>0:^ |x|^z/(dx) < ooj G [0, 2] . 



-^Weak convergence refers here to the behaviour of |E[F(Xg(„^„/t))] — E[i^(Xf)]| as n f 00. 
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The higher this index, the more important small jumps are. Levy processes with finite activity 
will always have Blumenthal-Getoor index p = 0. 

The approach in Dereich and Heidenreich [13] is to approximate the underlying Levy process 
via a jump diffusion process, to remove all the small jumps and to replace them by an additional 
linear component. While this method works optimally for processes with jump component of 
bounded variation, i.e. Blumenthal-Getoor index p < 1, the convergence rate degenerates 
rapidly when p gets larger and it becomes arbitrarily bad as p — )■ 2. See Figure [T] for a plot of 
the rate of convergence of this method (denoted DH in the figure). As shown in Dereich |14j . 
a better performance is possible when a Brownian correction is added in addition to the linear 
trend to better take account of the discarded small jumps. This is the curve denoted by D in 
Figure [l] It converges even for p = 2, but the convergence rate in this case decreases to z/~5. 




0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 



Figure 1: Order of convergence with respect to the Blumenthal-Getoor index. 



Our Wiener-Hopf based methods, WHMC (single-level) and MLWH (multilevel), are 
indifferent to the type of jump structure and thus entirely independent of the Blumenthal- 
Getoor index p. The price to pay for this robustness is that, in the bounded variation regime 
(i.e. for Levy processes without Gaussian component and with p < 1), the multilevel version is 
not quite optimal; recall that for a very general class of Levy processes X and payoff functions 
F the optimal rate of convergence is z/~2 (cf. Creutzig et al. [13] and Dereich [H] for comments 
to that effect). Indeed, for p < 1 and a = 0, the multilevel algorithm performance does not 
match the rate achieved by the methods of Dereich, but it coincides with the rate reported 
in Jacod et al. [22] (depicted as JKMP in Figure [T]). The performance of the single level 
algorithm in the bounded variation regime improves to the rate of MLWH for the general 
case. We do not depict these lines in Figure [1} Let us instead focus on the regime where the 
Blumenthal-Getoor index is p > 1, i.e. the Levy process has unbounded variation paths. In 
this case our single-level algorithm WHMC performs better than the method in Dereich and 
Heidenreich [15] for p > 4/3, but it is not able to beat the improved method in Dereich |14j . 
The multilevel version MLWH, on the other hand, does outperform Dereich [TJ] for p > 8/5 
and seems to be at least as good as the best currently available method for p = 2. Let us 
reiterate though that the class of functions F we have so far analysed is particular in that they 
depend only on the terminal value and the maximum Xt of X, whereas in Dereich [T3j and 
Dereich and Heidenreich [15] more general functionals of the entire path of X are allowed. As 
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mentioned above though, we beheve that our approach can also be adapted to the more general 
scenario considered in those publications, involving SDEs and path dependent functionals F, 
and still retain the same rates of convergence. We leave this issue to a further piece of work. 

In the context of more special functionals, it is also worth recalling the results in Jacod et 
al. [22]. Our multilevel algorithm only matches the performance of their method for p = 2. 
However, the assumptions in ^2] are more restrictive than our assumptions and those in Dereich 
et al. |15] , since only functionals of the terminal value Xt are allowed. Moreover, their functions 
are required to be four times continuously differentiable, highlighting some of the drawbacks 
and advantages when considering weak instead of strong estimates (cf. Remark 4.6). 

The next section is devoted to a description of another Wiener-Hopf-based algorithm, which 
achieves optimal convergence rates up to a logarithmic term for all values of the Blumenthal- 
Getoor index. Despite the fact that we are able to derive theoretical rates of convergence, the 
algorithm requires the ability to sample from a triplet of dependent distributions coming out of 
the Wiener-Hopf factorisation, for which currently no known significant examples are known. 



5 The random time T(n, t) 



A careful look at Section 2.2 reveals that there is nothing special about the random time 



g{n,n/t) in the description of the WHMC method of Theorem 2.1 other than it being a sum 
of independent and exponentially distributed random variables converging almost surely to t. 
Therefore one may investigate more efficient ways to approximate t by a sum of exponentials. 
We now present an alternative random time that exhibits faster convergence rates. 

Let N := {A^^ : s > 0} be a Poisson process independent of X with intensity t/n, where 
t > and n G N, associated with the Wiener-Hopf random walk (p3|). Let Tj^ be the time of the 
j-th arrival in A^, that is to say Tj^ = inf {s > : Ns = j} and define T(n, t) = Tl!^^_^_i. Then, it 
is well known and intuitively clear from the lack of memory property of the exponentially dis- 
tributed inter-arrival times of A^, that T(n, t) —t follows an exponential distribution with mean 
t/n. Moreover, lim„i-oo T(n, t) = t almost surely, and hence (^T(n,t), ^T(n,t)) also c onv erges to 



{Xt,Xt) almost surely. Therefore the same arguments as those found in Theorem 2.1 suggest 
that one may try to approximate {XT{n,t), XT{n,t)) by {V{Nt + 1, n/t), J{Nt + 1, n/t)). Unfortu- 
nately, current knowledge of the Wiener-Hopf factorisation only gives us information about the 
pair (Xe(q),Xe(q) — Xe(g)), rather than the triplet (Xe(g),Xe(q) — Xe(g), e(g)). Therefore we are 
not able to perform a simultaneous exact sample from {V{Nt + 1, n/t), J{Nt + 1, n/t), N^ + l), 
hence the random time T(n, t) cannot be used in a practical implementation with the theory 
developed in Kuznetsov et al. [28j . 

The time horizon T{n,t) leads to a slightly different construction or modification of the 



WHMC estimator described in Theorem 2.1| and eventually a different construction of the 



MLWH estimator and its associated algorithm, but for the sake of simplicity the Monte-Carlo 
estimate of K{F{Xt, Xt)) is still denoted by -F^'c^ written as in (5 ). However, for any i G N, 
we now understand F"'*^*^ as the i-th draw of the random variable 

pn ^ F{V{Nt + l,n/t), J{Nt + l,n/t)) . (25) 



Remark 5.1 A third time horizon which constitutes only a minor variation of T{n,t), is to 
consider the random time T{n,t) := T^^, in which case t — T{n,t) has the same distribution 
as e{n/t) A t. It is an easy exercise to see that all results applying to T(n,t) will hold also for 
T(n,t) and therefore we avoid any further comments on T[n,t). 
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5.1 Thinning and convergence theorems 

For the time horizon T{n,t) the thinning procedure follows analogously to the description in 
Section 3.2 and equations (11) and (12) remain true. The issue of consistency is not entirely 
straight forward however. Following the notation in Section 3.2, let £ be one of the levels in 
the multilevel algorithm and denote by T{n£_i,t) the time horizon produced by thinning the 
Poisson process A^^. We now show that T(n^_i,t) has the same distribution as T(n^_i,t). It is 
clear from the algorithm that 

T(n,,t) <t(n,_i,t) . (26) 



Indeed, due to the coin tossing either the two time horizons in (26) are equal or differ in a 
geometric sum of exponentials. Since T{n£,t) — t = e{ni/t) we note that 



t(n,_i, t)-t = T{n,, t)-t + J2 e.KA) = Yl ^^KA) = e(n^-i/t) = T(n,_i, t)-t , (27) 



i=l 



i=0 



where k is an independent geometric distribution on {0, 1, 2, ■ ■ ■ } with parameter 1/2. We then 
conclude that, as in Section 3.2l F"'^-'^ and have the same law. 

rather than extending the random grid on level £ and using the thinning algorithm together with 
(11) and (12) to produce a coarser sample, we only have to flip a coin once. If it is a head, then 



Equations (26) and i\27h suggest an interesting simplification. Given the pair (Xt 



X 



{X. 



X 



and thus the entire sample F"^'^*) — = 0. 



On the other hand, if it is a tail, we just need to simulate one independent draw (5 
of {X^^n,_,/t),X^(ne.i/t)) and set 



X 
X, 



TK_i,t) 
T(n^_i,t) 



X 



{i) 



V (Xj 



vii) _|_ clV _|_ rv 



(i) _|_ q{i) \ 



(28) 
(29) 



Observe that equations (28) and (29) significantly simplify the multilevel algorithm for the time 
horizon T(n, t). 

Proposition 5.2 Let t > and suppose X satisfies Assumption (Al). Then, for any n G N, 
(i) E[(XT(„,i) - Xtf] ^ n-i and E[|XT(„,t) -Xt\]< n-^l\ 



(%t) E[(Xtm) - Xtf] < and E[|XT(n,t) ~Xt\]< 



n 



-1/2 



Proof. As in the proof of Proposition 4.4, it is enough to prove the second moment results 
in (i) and (ii), since the results for the first moments then follow from Jensen's inequality. 
Thanks to the lack of memory property, it is well known that T(n, t) —t follows an exponential 
distribution with mean t/n. It follows that, for all G N, 



E[(T(n,t)-t)2] 



2t2 



and E[|T(n,t) - t|] = E[T(n,t)-t] 



t 

n 



The desired result now follows by appealing to Lemma 4.3 



(30) 

□ 



As for Theorem 4.5, the next theorem follows immediately from Theorems 4.1 and 4.2 



Proposition 5.2, and the fact that 7 = 1 (cf. Assumption (A3)). 
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Theorem 5.3 Suppose that assumptions (Al)-(A3) are satisfied and that the random time 
horizon is T(n, t). Then, for any i/ G N and under the constraint that the expected value of the 
total computational cost is Oiy) operations, the root mean square error for the WHMC and the 
MLWH method can be bounded respectively by 

Note that the muhilevel algorithm with time horizon T(n,t) achieves optimal convergence 
rate up to a logarithm factor. Recalling the discussion in Section Afl and Figure [T} the above 



methodology would offer optimal convergence throughout the entire range of the Blumenthal- 
Getoor index, thereby beating any of the methods considered there. 



6 Implementation and Numerical Results 

The aim of this section is to provide an example of the application of the WHMC and the new 
MLWH method to demonstrate the feasibility of both algorithms and to confirm the theoretical 
results of the preceding sections. 

We will study Xt and X^, as well as the price of a barrier option assumed to belong to a 
market driven by a member of the /3-class. We use this class because there is an explicit form of 
the Wiener-Hopf factorisation and because, through appropriate parameter choices, it contains 
examples of processes with paths of bounded and unbounded variation. Moreover Kuznetsov et 
al. [2H1 Section 4] argue that a large class of Levy processes can be approximated by a member 
of the /3-class plus a compound Poisson process. It is beyond the scope of this paper to discuss 
the /3-class in a financial framework, but we refer the reader to Ferreiro-Castilla and Schoutens 
[17] and Schoutens and van Damme [35] where the /3-class is studied in relation with other 
popular models in finance and where a comprehensive numerical analysis of the computation 
of the extrema for this family is done. We will simply introduce the /3-class as a ten parameter 
Levy process with triplet (a,(j, H), such that the Levy measure is absolutely continuous with 
density 

n(d^) = ^i (i_e-bi:^)Ai ^i->oi^^ + ^2 (Y3^^1{-<o}da; , (31) 

and > 0, 6j > 0, Cj > and Aj G (0, 3), for i = 1,2. The expressions for the distribution of 
the extrema can be found in Kuznetsov [261 Section 4] and are given by the following infinite 
product representation 

L J 1 _L »^ ii ]_ _|_ 1-z I J ]_ _|_ »^ ii ]^ _|_ 



Co ii) n<-l Cn(g) Q(g) n>l CAl) 

where (oil) {Cn(Q')}n<-i correspond to the negative zeros and Coi^l) and {(n{(l)}n>i cor- 
respond to the positive zeros of zz) + q = 0. The numerical simulation of X^^ and X^^^ 
requires truncating the infinite products above, but it can be shown using [2B1 Theorem 10] 
and [m Section 3.1] that the cost to obtain a sample via this process (to an arbitrary degree of 
accuracy) is independent of the value of q, thus verifying Assumption (A3). In fact this claim 
holds even for the more general family of processes introduced in Kuznetsov et al. [28] known 



as meromorphic Levy processes. From (31), one sees that processes belonging to the /3-class 
have Levy measures which decay exponentially and therefore have finite moments of all orders, 
in particular they verify Assumption (Al) for the entire range of parameters. 
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Figure 2: V(Xg(„^,„^/i) -Xg(„^_^,„^_^/t)) and V(Xg(„^,„^/t)) (left figure), as well as E[|Xg(„^,„^/t) 
Xg{ni_^,ne-i/t)\] and E[|Xg(„^,„^/j)|] (right figure) plotted against rii (using logg-scales). 



For the numerical results below, we choose a = q = = 6j = 1 and set A2 = Ai = A. 
We then study the behaviour and the robustness of our methods for various choices of a and 
A. For instance, if we set A = 0.5 then we have a process of finite activity and thus with 
Blumenthal-Getoor index p = 0. By letting A = 1.5, we have an infinite activity process with 
jump component of bounded variation (i.e. p G (0, 1)), and finally by setting A = 2.5, we have 
an infinite activity process with index p > 1. We are free to chose any value of the Gaussian 
coefficient a. When a = only the case with A = 2.5 will produce a process with paths of 
unbounded variation. When > all processes have paths of unbounded variation. 

The pay-off function in our case study relates to the pricing of barrier options, i.e. we are 
interested in computing 

E[{K - exp(xo + X,))+l^,,^+^^>,^] (32) 

for some K,b > and Xq G M. Barrier options play an important role in the computation of 
exotic options and are the building blocks for more complex derivatives such as Credit Default 



Swaps, see for instance Schoutens and Cariboni [3l]. Even though the pay-off function in (32) 
is not Lipschitz continuous, and thus does not satisfy Assumption (A2), we will see below that 
the theoretically predicted performance is nevertheless achieved. 

The programs were implemented in C-I--I-, in long double precision and executed on a 64- 
bit Intel(R) Xeon(R) CPU at 2.66GHz running 32-bit Linux 2.6.32-41-generic-pae using the 
GNU gcc compiler (version 4.8). Although currently running in a single threaded environment, 
(multilevel) Monte Carlo algorithms are of course ideally posed to be parallelised, as individual 
samples and simulations at different levels are independent of each other. 



6.1 Numerical results 

We start in Figures |2] and |3] by plotting, for various pairs of A and a and for ni = 2ni_i, the mean 
and the variance of (A:g(„^,„^/i)-Xg(„^_^,„^_^/t)) and (Xg(„^,n^/t)-Xg(„^_^,„^_j/t)). For reference we 



also include mean and variance of Xg(^ne,ne/t) and Xg(^ne,nt/t)- We see from the left plots in those 



figures that the convergence of Y{X^ 



ine-i,ni_i/t), 



with respect to is linear as 



predicted in Proposition 4.4 and shghtly better than predicted for V(Xg(„^^„^/i) — Xg( 
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Figure 3: V(Xg(„^_„^/t)^Xg(„^_^,„^_^/t)) and V(Xg(„^,„^/t)) (left figure), as well as E[|Xg(„^,„^/t)- 
^g(n^_i,nf_i/t)|] and E[|Xg(„^_„^/t)|] (right figure) plotted against (using loga-scales) . 

0.5r Or 




Figure 4: V(F"« - F"^-i) and V(F"^) (left figure), as well as strong error E[|F"^ - 

weak error |E[F"'^] — E[F"''^-i]| and EdF"*^!) (right figure) plotted against (using log2-scales) . 



Since the rate of the Cauchy sequence E[|Xg(„^_„^/t) — -^g{n^_i,n^_i/t)|] dictates the rate 
for E[|Xg(„^_„^/t) — Xt\], we can also verify that the convergence of E[|Xg(„^^„^/() — Xt\] and 
E[|Xg(„^ „^/() — X^l] is no worse than 0{i\''^^). In fact, in the absence of a Gaussian com- 
ponent and in the case bounded variation jump activity, the convergence of E[|Xg(„^^„^/t) — 

X^{ni_^,ni_^/t)\\ seems to be even linear in ra^ , which is not surprising given the piecewise lin- 
ear nature of this model. More surprisingly though, the bias for the running maximum seems 
to converge faster than predicted for any choice of A and a. 

The behaviour for our output functional in (32), which violates assumption (A3), is also 
very close to the conclusions in Theorem |4.5[ As the plots in Figure |4] show, the rate of 
decay in V(F"'^ — F"*-i) is approximately 0.36, while Theorem 4.5 claims /3 = 1/2. Similarly, 
E[|F"'^ — behaves roughly like 0{n^^^^), confirming that a = 1/4. We also see that 

V(F"*) remains constant as shown in (13). In the right plot of Figure |4] we also include the 
weak error, i.e. |E[F"^] — E[F"''^-i]|, which is indeed much smaller than the strong error and 
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Figure 5: Left: Average CPU-time (in milliseconds) to compute one sample F"'*^*) plotted 
against n = ni (log2-scales). Right: Total CPU-time (in seconds) for the WHMC and MLWH 
methods plotted against the estimated root mean square error e (log^Q-scales). 



decays much faster (cf. Remark 4.6). 



Finally, in Figure |5] we look at actual CPU times in running our WHMC and MLWH 
algorithms. In the left inset of Figure [s] we study the cost of computing one single sample F"'^*) 
of using the Wiener- Hopf approach, to confirm also numerically Assumption (A3). The 
cost clearly grows linearly with n, independently of any parameters in the Levy process. As 
mentioned in the previous section, this can in fact be proved rigorously. 

Our final plot in Figure [s] (right) shows actual performance results and compares the sin- 
gle level and the multilevel Wiener-Hopf Monte Carlo algorithms. The x-axis describes the 
estimated root mean square error (RMSE) e. The y-axis describes the cost u. In each data 
point of the plot, the finest level in the MLMC algorithm matches the approximation level in 
the WHMC. Next to each vertical pair of points we have labeled that value. This ensures that 
the bias error is the same in both algorithms. The sampling error in the MLMC error is split 
between levels accordingly to [19i. Equation (12)]. Since the weak error is much smaller than 
the variance reduction in this particular example, as seen in Figure |4| the multilevel algorithm 
outperforms the single level version only for relatively small accuracies. However, from Figure |5] 
one also sees that the slopes of the two lines clearly play into the hands of the MLMC. We 
have considered also only one functional so far. The variance curves in Figure [3] suggest that 
for functionals that depend more strongly on the maximum Xt, the MLWH method would be 
faster than the single level method already for nj^ = 2^ or = 2^. 

Finally, we also have not yet experimented with the factor s at which we coarsen the rates 
from one level to the next, i.e. ne = sne-i. We have only studied the case s = 2, while in 
Giles [19] it was shown that the optimal value for standard multilevel Monte Carlo methods 
for SDKs with Gaussian noise is s = 7. In any case, the results in Figure |5] (right) show that 
Wiener-Hopf Monte Carlo techniques allow the pricing of options in a market driven by a Levy 
process to an accuracy of O{10^^) in less than a second. 

Acknowledgements. We are grateful to Kees van Schaik and StefFen Dereich for insightful 
discussions which lead to significant changes in this paper. 
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